Applying the Subdue Substructure Discovery System to the Chemical Toxicity Domain
Abstract
The number of chemical compounds grows every year, but our ability to analyze and classify them has not kept pace. Preventing cancer caused by many of these chemicals is of great scientific and humanitarian value. This work investigates the use of AI discovery tools for predicting chemical toxicity. The basic idea is to obtain structure-activity relationships (SARs) [Srinivasan et al.], which relate molecular structures to carcinogenic activity. The data come from the U.S. National Toxicology Program, conducted by the National Institute of Environmental Health Sciences (NIEHS). This research outlines a general approach for automatically discovering repetitive substructures in these datasets. Relevant SARs are identified using the Subdue substructure discovery system, which discovers commonly occurring substructures in a given set of compounds. The best substructure found by Subdue is used as a pattern indicative of carcinogenic activity.
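The idea of "commonly occurring substructures" can be illustrated with a deliberately tiny sketch. It is not Subdue's algorithm: it only counts one-bond patterns, whereas Subdue searches for larger graph substructures. The function names (`bond_patterns`, `most_common_patterns`), the compound representation, and the three toy compounds are all assumptions made up for illustration; none of them comes from the NTP data.

```python
from collections import Counter


def bond_patterns(compound):
    """Return the set of one-bond patterns (atom, bond, atom) in a compound.

    Hypothetical representation: 'atoms' maps an atom id to its element,
    'bonds' is a list of (id, id, bond_type). Element labels are sorted so
    that C-N and N-C count as the same pattern.
    """
    atoms = compound["atoms"]
    patterns = set()
    for a, b, bond in compound["bonds"]:
        first, second = sorted((atoms[a], atoms[b]))
        patterns.add((first, bond, second))
    return patterns


def most_common_patterns(compounds):
    """Count how many compounds contain each pattern, most common first."""
    counts = Counter(p for c in compounds for p in bond_patterns(c))
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented toy compounds, for illustration only.
compounds = [
    {"atoms": {1: "C", 2: "N", 3: "O"},
     "bonds": [(1, 2, "single"), (2, 3, "double")]},
    {"atoms": {1: "N", 2: "O", 3: "C"},
     "bonds": [(1, 2, "double"), (1, 3, "single")]},
    {"atoms": {1: "C", 2: "C", 3: "O"},
     "bonds": [(1, 2, "single"), (2, 3, "single")]},
]

ranked = most_common_patterns(compounds)
for pattern, count in ranked:
    print(pattern, count)
```

In this toy setting the top-ranked pattern plays the role of the "best substructure"; a real system would also need to grow patterns beyond a single bond and score them by more than raw frequency.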
Similar resources
Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previo...
Graph-Based Hierarchical Conceptual Clustering
Hierarchical conceptual clustering has proven to be a useful, although under-explored, data mining technique. A graph-based representation of structural information combined with a substructure discovery technique has been shown to be successful in knowledge discovery. The SUBDUE substructure discovery system provides one such combination of approaches. This work presents SUBDUE and the develop...
Substructure Discovery in the SUBDUE System
Because many databases contain or can be embellished with structural information, a method for identifying interesting and repetitive substructures is an essential component to discovering knowledge in such databases. This paper describes the Subdue system, which uses the minimum description length (MDL) principle to discover substructures that compress the database and represent structural co...
Fuzzy Substructure Discovery
This paper describes a method for discovering substructures in data using a fuzzy graph match. A previous implementation of the Subdue system discovers substructures based on the psychologically motivated criteria of cognitive savings, compactness, connectivity and coverage. However, the instances in the data must exactly match the discovered substructures. We describe a new implementation of ...
Graph Based Concept Learning
Concept Learning is a Machine Learning technique in which the learning process is driven by providing positive and negative examples to the learner. From those examples, the learner builds a hypothesis (concept) that describes the positive examples and excludes the negative examples. Inductive Logic Programming (ILP) systems have successfully been used as concept learners. Examples of those are...